Search CORE

102 research outputs found

Scalable Planning and Learning for Multiagent POMDPs: Extended Version

Author: Amato Christopher
Oliehoek Frans A.
Publication venue
Publication date: 19/12/2014
Field of study

Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems

arXiv.org e-Print Archive

University of Liverpool Repository

CiteSeerX

International Migration, Integration and Social Cohesion online publications

Association for the Advancement of Artificial Intelligence: AAAI Publications

Best Response Bayesian Reinforcement Learning for Multiagent Systems with State Uncertainty

Author: Amato Christopher
Oliehoek Frans A
Publication venue
Publication date: 01/01/2014
Field of study

It is often assumed that agents in multiagent systems with state uncertainty have full knowledge of the model of dy- namics and sensors, but in many cases this is not feasible. A more realistic assumption is that agents must learn about the environment and other agents while acting. Bayesian methods for reinforcement learning are promising for this type of learning because they allow model uncertainty to be considered explicitly and offer a principled way of dealing with the exploration/exploitation tradeoff. In this paper, we propose a Bayesian RL framework for best response learn- ing in which an agent has uncertainty over the environment and the policies of the other agents. This is a very general model that can incorporate different assumptions about the form of other policies. We seek to maximize performance and learn the appropriate models while acting in an online fashion by using sample-based planning built from power- ful Monte-Carlo tree search methods. We discuss the theo- retical properties of this approach and experimental results show that the learning approaches can significantly increase value when compared to initial models and policies

University of Liverpool Repository

CiteSeerX

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)

Author: Kochenderfer Mykel J.
Oliehoek Frans A.
Robbel Philipp
Publication venue
Publication date: 29/11/2015
Field of study

Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem sizes and types they can handle. We present an approach to mitigate this limitation for certain types of multiagent systems, exploiting a property that can be thought of as "anonymous influence" in the factored MDP. Anonymous influence summarizes joint variable effects efficiently whenever the explicit representation of variable identity in the problem can be avoided. We show how representational benefits from anonymity translate into computational efficiencies, both for general variable elimination in a factor graph but in particular also for the approximate linear programming solution to factored MDPs. The latter allows to scale linear programming to factored MDPs that were previously unsolvable. Our results are shown for the control of a stochastic disease process over a densely connected graph with 50 nodes and 25 agents.Comment: Extended version of AAAI 2016 pape

arXiv.org e-Print Archive

University of Liverpool Repository

Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

Author: Oliehoek Frans A.
Spaan Matthijs T. J.
Witwicki Stefan
Publication venue
Publication date: 20/07/2015
Field of study

Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them in sub-problems, and at each of these sub-problems making optimistic assumptions with respect to the influence that will be exerted by the rest of the system. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning.Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015

arXiv.org e-Print Archive

University of Liverpool Repository

CiteSeerX

Dec-POMDPs as Non-Observable MDPs

Author: Amato Christopher
Oliehoek Frans A
Publication venue
Publication date: 01/01/2014
Field of study

A recent insight in the field of decentralized partially observable Markov decision processes (Dec-POMDPs) is that it is possible to convert a Dec-POMDP to a non-observable MDP, which is a special case of POMDP. This technical report provides an overview of this reduction and pointers to related literature

University of Liverpool Repository

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information

Author: Oliehoek Frans A.
Roijers Diederik M.
Wiggers Auke J.
Publication venue
Publication date: 01/01/2016
Field of study

Zero-sum stochastic games provide a rich model for competitive decision making. However, under general forms of state uncertainty as considered in the Partially Observable Stochastic Game (POSG), such decision making problems are still not very well understood. This paper makes a contribution to the theory of zero-sum POSGs by characterizing structure in their value function. In particular, we introduce a new formulation of the value function for zs-POSGs as a function of the "plan-time sufficient statistics" (roughly speaking the information distribution in the POSG), which has the potential to enable generalization over such information distributions. We further delineate this generalization capability by proving a structural result on the shape of value function: it exhibits concavity and convexity with respect to appropriately chosen marginals of the statistic space. This result is a key pre-cursor for developing solution methods that may be able to exploit such structure. Finally, we show how these results allow us to reduce a zs-POSG to a "centralized" model with shared observations, thereby transferring results for the latter, narrower class, to games with individual (private) observations

arXiv.org e-Print Archive

University of Liverpool Repository

UvA-DARE

International Migration, Integration and Social Cohesion online publications

On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games

Author: Loftin Robert
Oliehoek Frans A.
Publication venue
Publication date: 23/07/2022
Field of study

Learning to cooperate with other agents is challenging when those agents also possess the ability to adapt to our own behavior. Practical and theoretical approaches to learning in cooperative settings typically assume that other agents' behaviors are stationary, or else make very specific assumptions about other agents' learning processes. The goal of this work is to understand whether we can reliably learn to cooperate with other agents without such restrictive assumptions, which are unlikely to hold in real-world applications. Our main contribution is a set of impossibility results, which show that no learning algorithm can reliably learn to cooperate with all possible adaptive partners in a repeated matrix game, even if that partner is guaranteed to cooperate with some stationary strategy. Motivated by these results, we then discuss potential alternative assumptions which capture the idea that an adaptive partner will only adapt rationally to our behavior.Comment: 9 pages, to be published in The Proceedings of the 39th International Conference on Machine Learning, 202

arXiv.org e-Print Archive

White Rose Research Online

Scalable Planning and Learning for Multiagent POMDPs

Author: AAAI
Amato Christopher
Oliehoek Frans A
Publication venue
Publication date: 01/01/2015
Field of study

University of Liverpool Repository